Integrated exercise

Background behind the dataset

This is a simulated dataset based on an existing study by Frumuselu et al. (2015). In this study, the key question was whether subtitles help foreign language acquisition. Spanish students (n = 36) watched episodes of the popular TV show “Friends” for half an hour each week, over the course of 26 weeks. The students were assigned to one of 3 conditions:

  • English subtitled (condition “FL”)

  • Spanish subtitled (condition “MT”)

  • No subtitles (condition “NoSub”)

On 3 occasions the students took a fluency test:

  • Before the 26 weeks started

  • After 12 weeks

  • After the experiment

The dependent variable is a measure based on the number of words used in a scripted spontaneous interview with a test taker. The data is structured as follows:

Code
load(file = "Subtitles.RData")
head(Subtitles, 9)
  student occasion condition fluency
1       1     Occ1        FL  101.25
2       1     Occ2        FL  103.76
3       1     Occ3        FL  117.39
4       2     Occ1        MT   98.79
5       2     Occ2        MT  106.75
6       2     Occ3        MT  110.54
7       3     Occ1     NoSub  104.83
8       3     Occ2     NoSub  102.04
9       3     Occ3     NoSub  100.63

If we visualize the dataset we get a first impression of the effect of the condition. In this exercise it is your task to do the proper Bayesian modelling and interpretation.

Code
library(tidyverse)

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times", size = 10),
                  panel.grid = element_blank())
)

Subtitles %>%
  ggplot(
    aes(
      x = occasion,
      y = fluency,
      group = student
      )
  ) +
  geom_path(
    aes(
      color = condition
    )
  ) 



1. Task 1

First, we start by building 4 alternative mixed effects models. Each model has to take into account the fact that we have multiple observations per student, so we need a random effect for students in the model.

  • M0: only an intercept and a random effect for student

  • M1: M0 + fixed effect of occasion

  • M2: M1 + fixed effect of condition

  • M3: M2 + interaction effect between occasion and condition

Make use of the default priors of brms.

Once the models are estimated, compare their fit using leave-one-out cross-validation. Determine which model fits best; that model will be used to build our inferences.


Solution

The following code block shows how the models can be estimated and compared on their fit. Notice how I first create a set of dummy variables for the categories of the occasion and condition variables. This makes defining the model a bit more work (more code to write), but it creates some flexibility later on. For instance, when thinking about setting priors, I can set a prior for the effect of each of these dummy variables. It also results in shorter parameter names in the fitted model objects.

Code
library(brms)
library(tidyverse)

Subtitles <- Subtitles %>%
  mutate(

    # dummy for Occ2
    Occ2 = case_when(
      occasion == "Occ2" ~ 1,
      occasion == "Occ1" ~ 0,
      occasion == "Occ3" ~ 0
    ),

    # dummy for Occ3
    Occ3 = case_when(
      occasion == "Occ3" ~ 1,
      occasion == "Occ1" ~ 0,
      occasion == "Occ2" ~ 0
    ),

    # dummy for FL condition
    FL = case_when(
      condition == "FL" ~ 1,
      condition == "MT" ~ 0,
      condition == "NoSub" ~ 0
    ),

    # dummy for MT condition
    MT = case_when(
      condition == "MT" ~ 1,
      condition == "FL" ~ 0,
      condition == "NoSub" ~ 0
    )
  )

# Estimate the models

M0 <- brm(
  fluency ~ 1 + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975
)

M1 <- brm(
  fluency ~ 1 + Occ2 + Occ3 + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975
)

M2 <- brm(
  fluency ~ 1 + Occ2 + Occ3 + FL + MT + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975
)

M3 <- brm(
  fluency ~ 1 + Occ2*FL + Occ2*MT + Occ3*FL + Occ3*MT + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975
)

# loo cross-validation of the models

loo_M0 <- loo(M0)
loo_M1 <- loo(M1)
loo_M2 <- loo(M2)
loo_M3 <- loo(M3)

loo_models <- loo_compare(
  loo_M0,
  loo_M1,
  loo_M2,
  loo_M3
)

print(loo_models, simplify = FALSE)
   elpd_diff se_diff elpd_loo se_elpd_loo p_loo  se_p_loo looic  se_looic
M3    0.0       0.0  -294.0      7.7        28.9    3.5    588.1   15.4  
M2  -35.9       7.0  -330.0      8.4        12.3    1.9    659.9   16.7  
M1  -42.6       8.1  -336.6      9.7        19.9    3.3    673.2   19.4  
M0  -73.5       8.5  -367.6      8.6         6.0    1.1    735.2   17.2  

Based on the model comparison we can conclude that the final model (M3) fits the data best. We will use this model in the next sections to build our inferences.

Note

When doing the loo comparison you might encounter a warning message saying that one or more observations show a Pareto k value higher than 0.7. This means that the leave-one-out cross-validation might be biased. The warning message suggests ‘moment matching’ as a potential solution, which you could try. In my experience, the impact of one or two observations with a too high Pareto k value is rather small or even negligible. But it is always better to double check. Also, be aware that moment matching makes the loo estimation very slow!
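
If you want to try it, the following sketch shows how moment matching could be requested (the object names are just for illustration; note that brms requires the model to be fit with save_pars = save_pars(all = TRUE) for moment matching):

Code
# Refit the best fitting model, storing all parameter draws
# (needed for moment matching); expect this to be slow
M3_mm <- update(M3, save_pars = save_pars(all = TRUE))

# Leave-one-out cross-validation with moment matching
loo_M3_mm <- loo(M3_mm, moment_match = TRUE)
print(loo_M3_mm)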


2. Task 2

Now that we have established the best fitting model, it is our task to approach the model critically before delving into the interpretation of the results.

Apply the different steps of the WAMBS check-list template for the final model.

Subtask 2.1: what about the priors?

What are the default brms priors? Do they make sense? Do they generate impossible datasets? If necessary, specify your own (weakly informative) priors and approach them critically as well.


Potential Solution

Let’s start with a prior predictive check.

Code
M3_priors <- brm(
  fluency ~ 1 + Occ2*FL + Occ2*MT + Occ3*FL + Occ3*MT + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975,
  sample_prior = "only"
)

pp_check(M3_priors)

As you might notice, you will get an error message saying that sampling from the priors is not possible. This is because brms uses flat priors for the fixed effects by default, and flat priors cannot be sampled from.

To get the priors used by brms we use the get_prior() command.

Code
get_prior(
  fluency ~ 1 + Occ2*FL + Occ2*MT + Occ3*FL + Occ3*MT + (1|student),
  data = Subtitles
)
                    prior     class      coef   group resp dpar nlpar lb ub       source
                   (flat)         b                                              default
                   (flat)         b        FL                               (vectorized)
                   (flat)         b   FL:Occ3                               (vectorized)
                   (flat)         b        MT                               (vectorized)
                   (flat)         b   MT:Occ3                               (vectorized)
                   (flat)         b      Occ2                               (vectorized)
                   (flat)         b   Occ2:FL                               (vectorized)
                   (flat)         b   Occ2:MT                               (vectorized)
                   (flat)         b      Occ3                               (vectorized)
 student_t(3, 104.9, 6.1) Intercept                                              default
     student_t(3, 0, 6.1)        sd                                    0         default
     student_t(3, 0, 6.1)        sd           student                  0    (vectorized)
     student_t(3, 0, 6.1)        sd Intercept student                  0    (vectorized)
     student_t(3, 0, 6.1)     sigma                                    0         default

For all the betas (fixed effects) brms uses a flat prior. That is something better avoided; more appropriate would be to come up with our own priors. Let’s think about this. All the explanatory variables are dummy variables, so they quantify differences between groups of observations (based on time or condition).

As we have no prior idea about the direction of the effects of condition or time, we can use a prior distribution centred around 0 (most probability assigned to no effect).

Next, we have to think about the width of the prior. For instance, if we use a normal distribution to express our prior belief, we have to choose an sd for that normal distribution. In our case, the sd has to be high enough to assign some probability even to very strong positive and negative effects. Here, it is important that we know our data well; that is, we need to know the scale of our dependent variable. The fluency variable has an sd of about 7.1, so we can use effect sizes as a reference frame. Remember, an effect size of 0.8 (or higher) indicates a strong effect (Cohen’s d). So, on the scale of our fluency variable, an effect of 5.7 (= 7.1 * 0.8) indicates a strong effect. Let’s use the value 5.7 as the sd of the priors for the effects of our dummy variables.
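
This threshold can be computed directly from the data (a small sketch):

Code
# sd of the dependent variable on its own scale
sd_fluency <- sd(Subtitles$fluency)   # approximately 7.1

# a strong effect (Cohen's d = 0.8) expressed on that scale
0.8 * sd_fluency                      # approximately 5.7

Visually, the prior would look like this: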

Code
# Setting a plotting theme

library(ggplot2)
library(ggtext) # to be able to change the fonts etc in graphs

theme_set(theme_linedraw() +
            theme(text = element_text(family = "Times", size = 8),
                  panel.grid = element_blank(),
                  plot.title = element_markdown())
)

Prior_betas <- ggplot( ) +
  stat_function(
    fun = dnorm,    # We use the normal distribution
    args = list(mean = 0, sd = 5.7), # centred at 0, sd derived above
    xlim = c(-15,15)
  ) +
  scale_y_continuous(name = "density") +
  labs(title = "Prior for the effects of independent variables",
       subtitle = "N(0,5.7)")

Prior_betas

Notice that even effects of -10 and 10 (almost effect sizes of -2 and 2) still get a decent amount of probability under our prior density function.

Let’s set these priors and try to apply a pp_check(). Notice that I set the priors for all slopes (class = "b") at once.

Code
Custom_prior <- c(
  set_prior(
    "normal(0,5.7)",
    class = "b"
  )
)

M3_priors <- brm(
  fluency ~ 1 + Occ2*FL + Occ2*MT + Occ3*FL + Occ3*MT + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975,
  prior = Custom_prior,
  sample_prior = "only"
)

pp_check(M3_priors)

The simulated data covers a wide range! But it does not generate extremely high or low observations. From this check we also learn that we have set quite broad priors, as they result in big differences between the simulated datasets generated by our model.

Time to apply these priors (which we now understand) to estimate the actual model.

Code
Custom_prior <- c(
  set_prior(
    "normal(0,5.7)",
    class = "b"
  )
)

M3b <- brm(
  fluency ~ 1 + Occ2*FL + Occ2*MT + Occ3*FL + Occ3*MT + (1|student),
  data = Subtitles,
  cores = 4,
  backend = "cmdstanr",
  seed = 1975,
  prior = Custom_prior
  )

Subtask 2.2: did the model converge properly?

Perform different checks on the convergence of the model.


Possible solution

Let’s start by checking the trace-plots.

Code
library(ggmcmc)

Model_chains <- ggs(M3b)

Model_chains %>%
  filter(Parameter %in% c(
    "b_Intercept",
    "b_FL", 
    "b_MT", 
    "b_Occ2", 
    "b_Occ3",
    "b_FL:Occ2",
    "b_FL:Occ3",
    "b_MT:Occ2",
    "b_MT:Occ3"
    )
  ) %>%
  ggplot(aes(
    x   = Iteration,
    y   = value, 
    col = as.factor(Chain)))+
  geom_line() +
  facet_grid(Parameter ~ .,
             scales = 'free_y',
             switch = 'y') +
  labs(title = "Caterpillar Plots for the parameters",
       col   = "Chains")

Looking at the trace-plots, we can conclude that all chains mixed very well. This is a first indication of successful convergence.

Next, we can check the R-hat statistic for each of the parameters. With the following plot we get a visual overview of all the R-hat statistics (notice the large number of parameters, because we also have random effects in our model):

Code
library(bayesplot)

mcmc_rhat(
  rhat(M3b), 
  size = 1) +
  yaxis_text(hjust = 1) # to print parameter names

None of the parameters shows a high R-hat statistic; all are below the threshold of 1.05, indicating that all parameters converged well.
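
If you prefer a numerical check over the visual one, the largest R-hat value can be inspected directly (a sketch):

Code
# the largest R-hat across all parameters; should stay below 1.05
max(rhat(M3b), na.rm = TRUE)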

Time to get insight into the amount of autocorrelation. A first check is plotting the ratio of the Effective Sample Size to the Total Sample Size for all the parameters. Remember that this ratio should be above 0.1 to be sure that the amount of autocorrelation is acceptable. The following code gives a visual overview of these ratios.

Code
mcmc_neff(
  neff_ratio(M3b)
  ) + 
  yaxis_text(hjust = 1)

From the plot we learn that for all the parameters the ratio is above 0.1. So, we can conclude that the amount of autocorrelation is not problematic for any of the parameters.


Subtask 2.3: does the posterior distribution histogram have enough information?

Check if the posterior distribution histograms of the different parameters are informative enough to substantiate our inferences.


Possible solution

To evaluate this, we create histograms based on the draws for our model parameters. We will do this first for all main effects.

Code
library(patchwork)

posterior_PD <- as_draws_df(M3b)

post_intercept <- 
  posterior_PD %>%
  select(b_Intercept) %>%
  ggplot(aes(x = b_Intercept)) +
  geom_histogram() +
  ggtitle("Intercept") 

post_Occ2 <- 
  posterior_PD %>%
  select(b_Occ2) %>%
  ggplot(aes(x = b_Occ2)) +
  geom_histogram() +
  ggtitle("Beta Occ2") 

post_Occ3 <- 
  posterior_PD %>%
  select(b_Occ3) %>%
  ggplot(aes(x = b_Occ3)) +
  geom_histogram() +
  ggtitle("Beta Occ3") 

post_FL <- 
  posterior_PD %>%
  select(b_FL) %>%
  ggplot(aes(x = b_FL)) +
  geom_histogram() +
  ggtitle("Beta FL") 

post_MT <- 
  posterior_PD %>%
  select(b_MT) %>%
  ggplot(aes(x = b_MT)) +
  geom_histogram() +
  ggtitle("Beta MT") 


post_intercept + post_Occ2 + post_Occ3 + post_FL + post_MT +
  plot_layout(ncol = 3)

Each of these plots shows a clear peak with smooth slopes, indicating that the posterior is informative enough for each of these parameters.

Now, let’s do the same for the interaction effects.

Code
post_Occ2_FL <- 
  posterior_PD %>%
  select(`b_Occ2:FL`) %>%
  ggplot(aes(x = `b_Occ2:FL`)) +
  geom_histogram() +
  ggtitle("Beta Occ2:FL") 

post_Occ2_MT <- 
  posterior_PD %>%
  select(`b_Occ2:MT`) %>%
  ggplot(aes(x = `b_Occ2:MT`)) +
  geom_histogram() +
  ggtitle("Beta Occ2:MT") 

post_Occ3_FL <- 
  posterior_PD %>%
  select(`b_FL:Occ3`) %>%
  ggplot(aes(x = `b_FL:Occ3`)) +
  geom_histogram() +
  ggtitle("Beta Occ3:FL") 

post_Occ3_MT <- 
  posterior_PD %>%
  select(`b_MT:Occ3`) %>%
  ggplot(aes(x = `b_MT:Occ3`)) +
  geom_histogram() +
  ggtitle("Beta Occ3:MT") 

post_Occ2_FL + post_Occ2_MT + post_Occ3_FL + post_Occ3_MT +
  plot_layout(ncol = 3)

Here the conclusion is the same. These histograms show no problematic cases.

Subtask 2.4: how well does the model predict the observed data?

Perform posterior predictive checks based on the model.


Possible solutions

Code
pp_check(M3b)
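
Besides the default density overlay, pp_check() can perform other types of checks. For instance, a sketch comparing the observed mean per condition with the means of the replicated datasets:

Code
# posterior predictive check of the mean, separately per condition
pp_check(
  M3b,
  type  = "stat_grouped",
  group = "condition",
  stat  = "mean"
)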

Subtask 2.5: what about prior sensitivity of the results?

Finally, we have to check whether the results of our model are not too dependent on the priors we specified in the model.
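
Possible solution

One way to approach this (a sketch; the object names are just for illustration): re-estimate the model with a clearly narrower and a clearly wider prior for the slopes and compare the posterior summaries. If the estimates barely move, the results are not overly sensitive to our prior choices.

Code
# refit with half and double the original prior sd for the slopes
M3b_narrow <- update(
  M3b,
  prior = set_prior("normal(0, 2.85)", class = "b")
)

M3b_wide <- update(
  M3b,
  prior = set_prior("normal(0, 11.4)", class = "b")
)

# compare the fixed effect estimates across the three fits
fixef(M3b)
fixef(M3b_narrow)
fixef(M3b_wide)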

3. Task 3

Now a more general task. Make different visualizations of the model results.

One of the possible visualizations could be a rather complex one. Remember, there are 3 conditions and 3 occasions. What I would like to see is a plot showing the expected means for each of the conditions on each of the 3 occasions.

And what do we learn about the progress between Occ1 and Occ2 in each of the groups?
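
Possible solution

One possible starting point for the expected means plot (a sketch, assuming the final model is stored in M3b): build a grid of all condition by occasion combinations, compute the posterior expected values while ignoring the student-level random effects, and plot the resulting means with credible intervals.

Code
library(tidyverse)

# grid of all condition x occasion combinations, with matching dummies
newdata <- expand_grid(
  occasion  = c("Occ1", "Occ2", "Occ3"),
  condition = c("FL", "MT", "NoSub")
) %>%
  mutate(
    Occ2 = as.numeric(occasion == "Occ2"),
    Occ3 = as.numeric(occasion == "Occ3"),
    FL   = as.numeric(condition == "FL"),
    MT   = as.numeric(condition == "MT")
  )

# posterior expected values, ignoring the student random effect
epred <- posterior_epred(M3b, newdata = newdata, re_formula = NA)

# summarise the draws per cell and plot
newdata %>%
  mutate(
    mean  = apply(epred, 2, mean),
    lower = apply(epred, 2, quantile, probs = 0.025),
    upper = apply(epred, 2, quantile, probs = 0.975)
  ) %>%
  ggplot(aes(x = occasion, y = mean,
             color = condition, group = condition)) +
  geom_pointrange(aes(ymin = lower, ymax = upper),
                  position = position_dodge(width = 0.3)) +
  geom_line(position = position_dodge(width = 0.3)) +
  labs(y = "expected fluency")

Such a plot makes it easy to compare, by eye, the progress between Occ1 and Occ2 across the three conditions.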

References

Frumuselu, Anca Daniela, Sven De Maeyer, Vincent Donche, and María del Mar Gutiérrez Colon Plana. 2015. “Television Series Inside the EFL Classroom: Bridging the Gap Between Teaching and Learning Informal Language Through Subtitles.” Linguistics and Education 32 (December): 107–17. https://doi.org/10.1016/j.linged.2015.10.001.